rnn model
On Multiplicative Integration with Recurrent Neural Networks
We introduce a general simple structural design called "Multiplicative Integration" (MI) to improve recurrent neural networks (RNNs). MI changes the way of how the information flow gets integrated in the computational building block of an RNN, while introducing almost no extra parameters. The new structure can be easily embedded into many popular RNN models, including LSTMs and GRUs. We empirically analyze its learning behaviour and conduct evaluations on several tasks using different RNN models. Our experimental results demonstrate that Multiplicative Integration can provide a substantial performance boost over many of the existing RNN models.
HitNet: Hybrid Ternary Recurrent Neural Network
Quantization is a promising technique to reduce the model size, memory footprint, and massive computation operations of recurrent neural networks (RNNs) for embedded devices with limited resources. Although extreme low-bit quantization has achieved impressive success on convolutional neural networks, it still suffers from huge accuracy degradation on RNNs with the same low-bit precision. In this paper, we first investigate the accuracy degradation on RNN models under different quantization schemes, and the distribution of tensor values in the full precision model. Our observation reveals that due to the difference between the distributions of weights and activations, different quantization methods are suitable for different parts of models. Based on our observation, we propose HitNet, a hybrid ternary recurrent neural network, which bridges the accuracy gap between the full precision model and the quantized model. In HitNet, we develop a hybrid quantization method to quantize weights and activations. Moreover, we introduce a sloping factor motivated by prior work on Boltzmann machine to activation functions, further closing the accuracy gap between the full precision model and the quantized model.
- North America > United States (0.14)
- Asia > Singapore (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.98)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.71)
- North America > United States > California > Santa Barbara County > Santa Barbara (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Greece > Attica > Athens (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
other comments in the paper if accepted
We appreciate the valuable comments from the reviewers. We will answer reviewers' questions from three aspects, i.e., In respond to Reviewer 5, this paper's major novelty is developing a new STL-based learning framework to Our method creates a practical way to ensure the logic rules' satisfaction in an end-to-end manner. Our approach achieves promising results on real city datasets, i.e., significantly We have carefully compared our work with all the related papers pointed out by the reviewers. Therefore, we also choose STL to express the model properties. Using STL to specify CPS properties is not our novelty.
- North America > United States (0.05)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
On Multiplicative Integration with Recurrent Neural Networks
We introduce a general simple structural design called "Multiplicative Integration" (MI) to improve recurrent neural networks (RNNs). MI changes the way of how the information flow gets integrated in the computational building block of an RNN, while introducing almost no extra parameters. The new structure can be easily embedded into many popular RNN models, including LSTMs and GRUs. We empirically analyze its learning behaviour and conduct evaluations on several tasks using different RNN models. Our experimental results demonstrate that Multiplicative Integration can provide a substantial performance boost over many of the existing RNN models.
LightRNN: Memory and Computation-Efficient Recurrent Neural Networks
Xiang Li, Tao Qin, Jian Yang, Tie-Yan Liu
While RNNs are becoming increasingly popular, they have a known limitation: when applied to textual corpora with large vocabularies, the size of the model will become very big. For instance, when using RNNs for language modeling, a word is first mapped from a one-hot vector (whose dimension is equal to the size of the vocabulary) to an embedding vector by an input-embedding matrix.
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Asia > China > Jiangsu Province > Nanjing (0.04)
HitNet: Hybrid Ternary Recurrent Neural Network
Quantization is a promising technique to reduce the model size, memory footprint, and massive computation operations of recurrent neural networks (RNNs) for embedded devices with limited resources. Although extreme low-bit quantization has achieved impressive success on convolutional neural networks, it still suffers from huge accuracy degradation on RNNs with the same low-bit precision. In this paper, we first investigate the accuracy degradation on RNN models under different quantization schemes, and the distribution of tensor values in the full precision model. Our observation reveals that due to the difference between the distributions of weights and activations, different quantization methods are suitable for different parts of models. Based on our observation, we propose HitNet, a hybrid ternary recurrent neural network, which bridges the accuracy gap between the full precision model and the quantized model. In HitNet, we develop a hybrid quantization method to quantize weights and activations. Moreover, we introduce a sloping factor motivated by prior work on Boltzmann machine to activation functions, further closing the accuracy gap between the full precision model and the quantized model.
- North America > United States (0.14)
- Asia > Singapore (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.98)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.71)